Abstract. This note presents several recent advances in the study of the long-time behavior of Markov processes. The examples presented below are related to other scientific fields such as PDEs, physics, and biology. The mathematical tools involved, such as propagation of chaos, coupling, and functional inequalities, give a good picture of the classical methods that furnish quantitative rates of convergence to equilibrium.

Summary. This article presents several recent advances in the study of the long-time behavior of certain Markov processes. The examples presented below are motivated by various applications from physics or biology. The mathematical tools employed (propagation of chaos, coupling, functional inequalities) cover a broad spectrum of the techniques available for obtaining quantitative long-time behavior.


Introduction 

This note gathers several advances in the study of the long-time behavior of Markovian (and non-Markovian) processes. The first section is dedicated to the study of stochastic differential equations driven by a fractional Brownian motion with a non-constant diffusion matrix. This process is not Markovian, but one can successfully adapt the coupling strategy to get quantitative long-time estimates. In the second section, a piecewise deterministic Markov process linked to a stochastic algorithm is studied by means of well-chosen couplings of the paths. The last two sections stress the fruitful links between mean field interacting particle systems and nonlinear parabolic partial differential equations.

1. Rate of convergence to equilibrium for fractional SDEs 

This part, which is a short version of [11], is devoted to estimating the rate of convergence to equilibrium of stochastic differential equations (SDEs) driven by a fractional Brownian motion (fBm). In this highly non-Markovian setting, the problem was first investigated by Hairer [17], who proved that, in the additive setting, the rate of convergence in total variation can be upper-bounded by $C t^{-\rho_H}$, where $\rho_H$ is a positive number depending on the Hurst parameter $H$ of the fBm. To our knowledge, however, there is no result in the multiplicative setting. Hence, we focus on generalizing the existing additive results to the multiplicative case and obtain an extension of [17] when $H > 1/2$. The main novelty of this work is the development of some Foster-Lyapunov techniques in this non-Markovian setting, which allows us to put in place an asymptotic coupling scheme like that of [17] without resorting to deterministic contracting properties.

1.1. Introduction 

We deal with an $\mathbb{R}^d$-valued process $(X_t)_{t\ge 0}$ which is a solution to the following SDE

$$ dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t \tag{1} $$

where $b : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \to \mathbb{M}_{d,d}$ are (at least) continuous functions, and where $\mathbb{M}_{d,d}$ is the set of $d \times d$ real matrices. In (1), $(B_t)_{t\ge 0}$ is a $d$-dimensional fBm with Hurst parameter $H \in (\frac12, 1)$. Note that under some Hölder regularity assumptions on the coefficients (see e.g. [31] for background), (strong) existence and uniqueness hold for the solution to (1) starting from $x_0 \in \mathbb{R}^d$.

Introducing the Mandelbrot-Van Ness representation of the fractional Brownian motion,

$$ B_t = \alpha_H \int_{-\infty}^{0} (-r)^{H-\frac12} \left( dW_{r+t} - dW_r \right), \quad t \ge 0, \tag{2} $$

where $(W_t)_{t\in\mathbb{R}}$ is a two-sided $\mathbb{R}^d$-valued Brownian motion and $\alpha_H$ is a normalization coefficient depending on $H$, the process $(X_t, (B_{s+t})_{s\le 0})_{t\ge 0}$ can be realized through a Feller transformation $(\mathcal{Q}_t)_{t\ge 0}$ on the product space $\mathbb{R}^d \times \mathcal{W}$, where $\mathcal{W}$ is a suitable Hölder-type space on $(-\infty, 0]$ (we refer to [18] for more rigorous background on this topic). In particular, an initial distribution of this dynamical system is a distribution $\mu_0$ on $\mathbb{R}^d \times \mathcal{W}$. In probabilistic terms, an initial distribution is the distribution of a couple $(X_0, (B_s)_{s\le 0})$ where $(B_s)_{s\le 0}$ is an $\mathbb{R}^d$-valued fBm on $(-\infty, 0]$.
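
To make the driving noise concrete, here is a minimal sketch that samples a one-dimensional fBm path on a finite grid. It does not discretize the Mandelbrot-Van Ness integral; instead it assumes the standard covariance $\frac12(s^{2H} + t^{2H} - |t-s|^{2H})$ of a normalized fBm and uses a Cholesky factorization. The function names `fbm_covariance` and `sample_fbm` are illustrative choices, not notation from [11].

```python
import numpy as np


def fbm_covariance(times, hurst):
    """Covariance matrix of a normalized fBm at the given positive times."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (s ** (2 * hurst) + u ** (2 * hurst) - np.abs(s - u) ** (2 * hurst))


def sample_fbm(n_steps, horizon, hurst, rng):
    """Sample (times, path) of an fBm on [0, horizon], with path[0] = 0."""
    # Strictly positive grid points: the covariance matrix is then positive definite.
    times = np.linspace(horizon / n_steps, horizon, n_steps)
    chol = np.linalg.cholesky(fbm_covariance(times, hurst))
    path = chol @ rng.standard_normal(n_steps)
    return np.concatenate(([0.0], times)), np.concatenate(([0.0], path))


if __name__ == "__main__":
    times, path = sample_fbm(64, 1.0, 0.75, np.random.default_rng(0))
    print(times[-1], path[-1])
```

The Cholesky approach costs $O(n^3)$ and is only meant for small grids; it is chosen here for transparency, not efficiency.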

Then, such an initial distribution is classically called an invariant distribution if it is invariant under the transformation $\mathcal{Q}_t$ for every $t \ge 0$. However, the concept of uniqueness of the invariant distribution is slightly different from the classical setting. Indeed, if $\bar{\mathcal{Q}}\mu$ stands for the distribution of the whole process $(X_t^\mu)_{t\ge 0}$ with initial distribution $\mu$, one says that uniqueness of the invariant distribution holds if the stationary regime is unique (in other words, this concept of uniqueness corresponds to the classical one up to the equivalence relation $\mu \sim \nu \iff \bar{\mathcal{Q}}\mu \sim \bar{\mathcal{Q}}\nu$, see [18] for background). We refer to [17-19] for criteria of uniqueness in different settings: additive noise, multiplicative noise with $H > 1/2$, and multiplicative noise with $H \in (1/3, 1/2)$ in the last one (in a hypoelliptic context).

The additive result of [17] is obtained by a coupling strategy that we briefly recall here. Classically, coupling two paths issued from $\mu_0$ and $\mu$, where the second one denotes an invariant distribution of $(\mathcal{Q}_t)_{t\ge 0}$, consists in finding a stopping time $\tau_\infty$ such that $(X^{\mu_0}_{t+\tau_\infty})_{t\ge 0} = (X^{\mu}_{t+\tau_\infty})_{t\ge 0}$ (so that the rate of convergence in total variation can be derived from some bounds on $\mathbb{P}(\tau_\infty > t)$, $t \ge 0$). Now, let us detail the strategy. First, one classically waits for the paths to get close. Then, at each trial, the coupling attempt is divided into two steps. In Step 1, one tries to stick the positions together on an interval of length 1. Then, in Step 2, one tries to ensure that the paths stay stuck until $+\infty$. Indeed, contrary to the Markovian case, where the paths naturally stay together after they meet (by putting the same noise on each coordinate), the main difficulty here is that, due to the memory, staying together is costly. In other words, this property can be ensured only with the help of a non-trivial coupling of the noises. One thus speaks of asymptotic coupling. If one of the two previous steps fails, we begin a new attempt, but only after a (long) waiting time which is called Step 3. During this step, one again waits for the paths to get close, but one also expects the memory of the coupling cost to vanish sufficiently, so that the new trial begins with a small weight of the memory.

In the previous construction, the fact that $\sigma$ is constant is fundamental for ensuring the two following properties:

• If two paths $B^1$ and $B^2$ of the fBm differ by a drift term, then two paths $X^1$ and $X^2$ of (1) respectively directed by $B^1$ and $B^2$ also differ by a drift term, which allows in particular the use of the Girsanov Theorem to build the coupling in Step 1.

• Under some "convexity" assumptions on the drift outside a compact set, two paths $X^1$ and $X^2$ directed by the same fBm (or more precisely, by two slightly different paths) get closer and the distance between the two paths can be controlled deterministically.

In the current work, $\sigma$ is not constant and the two above properties are no longer valid. The challenge then is to extend the applicability of the previous coupling scheme to such a situation. The replacement of each of the above properties requires us to deal with different (though related) difficulties. In order to be able to extend the Girsanov argument used in Step 1 to a non-constant $\sigma$, we restrict ourselves to diffusion coefficients for which some injective function of two copies of the process differs by a drift term whenever their driving fBms do. A natural assumption on $\sigma$ granting the latter property is that $x \mapsto \sigma^{-1}(x)$ is (well-defined and is) a Jacobian matrix. This will be the setting of the present paper.
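
The role of the Jacobian assumption can be illustrated by a short formal computation. It is a sketch under the assumption that, for $H > 1/2$, the integral in (1) is a Young integral obeying the usual chain rule, and that $h$ denotes a sufficiently smooth function with $\nabla h = \sigma^{-1}$; the superscripts $1, 2$ label two copies of the process driven by $B^1$ and $B^2$.

```latex
\begin{aligned}
d\,h(X_t) &= \nabla h(X_t)\, b(X_t)\, dt + \nabla h(X_t)\,\sigma(X_t)\, dB_t
           = \sigma^{-1}(X_t)\, b(X_t)\, dt + dB_t, \\
d\big(h(X^1_t) - h(X^2_t)\big)
          &= \big(\sigma^{-1}(X^1_t)\, b(X^1_t) - \sigma^{-1}(X^2_t)\, b(X^2_t)\big)\, dt
             + d\big(B^1_t - B^2_t\big).
\end{aligned}
```

In particular, if $B^1 - B^2$ is a drift term, then so is $h(X^1) - h(X^2)$, which is the property needed to extend the Girsanov argument of Step 1.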

As concerns a suitable substitution of the second lacking property, a natural (but to our knowledge so far not explored) 
idea is to try to extend Meyn-Tweedie techniques (see e.g. [10] for background) to the fractional setting. More precisely, 
even if the paths do not get closer to each other deterministically, one could expect that some Lyapunov assumption could 
eventually make the two paths return in some compact set simultaneously. The main contribution of the present paper is 
to incorporate such a Lyapunov-type approach into the study of long-time convergence in the fractional diffusion setting. 
As one could expect, compared to the Markovian case, the problem is much more involved. Indeed, the return time to a compact set after a (failed) coupling attempt depends not only on the positions of the processes after it, but also on the whole past of the fBm. Therefore, for the coupling attempt to succeed with lower-bounded probability, one needs to establish some controls on the past behavior of the fBms that drive the two copies of the process, conditionally on the failure of the previous attempts. This point is one of the main difficulties of the paper since, in the corresponding estimates, we have to take carefully into account all the deformations of the distribution that previously failed attempts induce. Then, we show that after a sufficiently long waiting time, conditionally on the previous failures, the probability that the two paths are in a compact set and that the influence of the past noise on the future is controlled is lower-bounded. Bringing all the estimates together yields a global control of the coupling time and a rate of convergence which is similar to the one in [17] in the additive noise case.

1.2. Assumptions and Main Result 

Recall that throughout this section it is assumed that $H \in (1/2, 1)$. We begin with a condition for the existence and uniqueness of solutions to (1):

$(H_0)$: $b$ is a locally Lipschitz and sublinear function and $\sigma$ is a bounded $(1+\gamma)$-Lipschitz continuous function with $\gamma \in (\frac1H - 1, 1]$ (i.e. $\sigma$ is a $C^1$-function whose partial derivatives are bounded and globally $\gamma$-Hölder-continuous).

Before introducing the second assumption, let us give a definition. We say that a function $V : \mathbb{R}^d \to \mathbb{R}$ is essentially quadratic if it is a positive $C^1$-function such that $\nabla V$ is Lipschitz continuous and such that

$$ \liminf_{|x| \to +\infty} \frac{V(x)}{|x|^2} > 0 \quad\text{and}\quad |\nabla V| \le C\sqrt{V} \quad (C > 0). $$

In order to ensure the existence of the invariant distribution, we now introduce a Lyapunov-stability assumption $(H_1)$ through such a function $V$:

$(H_1)$: There exist an essentially quadratic function $V : \mathbb{R}^d \to \mathbb{R}$ and some positive $\beta_0$ and $\kappa_0$ such that

$$ \forall x \in \mathbb{R}^d, \quad \langle \nabla V(x) \,|\, b(x) \rangle \le \beta_0 - \kappa_0 V(x). $$

Remark 1.1 (Comparison to the Markovian case). For the coupling strategy, the above assumption is used to ensure that the paths live in a compact set with high probability. Note that in the classical diffusion setting, such a property holds under some less restrictive Lyapunov assumptions. Here, the assumptions $(H_0)$ and $(H_1)$ essentially allow us to consider only (attractive) drift terms whose growth is linear at infinity. This more restrictive condition can be understood as a consequence of the lack of martingale property for the integrals driven by fBms, which leads in fact to a more important contribution of the noise component.

Remark 1.2 (Comparison to previous results). In [17], the corresponding assumption is a contraction condition outside a compact set: for any $x, y$, $\langle b(x) - b(y) \,|\, x - y\rangle \le \beta_0 - \kappa_0 |x-y|^2$. This means that even in the constant case, our work can cover some new cases. For instance, if $d = 2$ and $b(z) = -z - \rho\cos(\theta_z) z^\perp$ (where $\rho \in \mathbb{R}$, $\theta_z$ is the angle of $z$ and $z^\perp$ is its normal vector), Assumption $(H_1)$ holds whereas one can check that the contraction condition is not satisfied if $\rho > 2$.
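
The example of Remark 1.2 can be probed numerically. The sketch below makes several assumptions that are not stated in the text: the conventions $z^\perp = (-z_2, z_1)$ and $\cos(\theta_z) = z_1/|z|$, the candidate Lyapunov function $V(x) = 1 + |x|^2$, and the value $\rho = 3$. With these choices $\langle \nabla V(x) | b(x)\rangle = 2 - 2V(x)$ exactly, while the quadratic form $\langle b(x) - b(y) | x - y\rangle$ is positive for a pair of nearby points on the unit circle; since $b$ is positively homogeneous of degree 1, scaling this pair rules out any bound of the form $\beta_0 - \kappa_0|x-y|^2$.

```python
import numpy as np


def drift(z, rho):
    """b(z) = -z - rho * cos(theta_z) * z_perp, with z_perp = (-z2, z1) (assumed convention)."""
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm == 0.0:
        return np.zeros(2)
    z_perp = np.array([-z[1], z[0]])
    return -z - rho * (z[0] / norm) * z_perp


def lyapunov_gap(x, rho):
    """<grad V(x), b(x)> - (2 - 2 V(x)) for the illustrative choice V(x) = 1 + |x|^2."""
    x = np.asarray(x, dtype=float)
    v = 1.0 + x @ x
    return 2.0 * (x @ drift(x, rho)) - (2.0 - 2.0 * v)


def contraction_form(x, y, rho):
    """<b(x) - b(y), x - y>, the left-hand side of the contraction condition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (drift(x, rho) - drift(y, rho)) @ (x - y)


if __name__ == "__main__":
    x = np.array([0.0, 1.0])
    y = np.array([-np.sin(0.1), np.cos(0.1)])
    print(lyapunov_gap([1.5, -0.3], 3.0), contraction_form(x, y, 3.0))
```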

When the paths are in this compact set, one tries classically to couple them with positive probability. But, as mentioned
before, the specificity of the non-Markovian setting is that the coupling attempt generates a cost for the future (in a sense 
made precise later). In order to control this cost or more precisely in order to couple the paths with the help of a controlled 
drift term, we need to ensure the next assumption: 

$(H_2)$: $\forall x \in \mathbb{R}^d$, $\sigma(x)$ is invertible and there exists a $C^1$-function $h = (h_1, \dots, h_d) : \mathbb{R}^d \to \mathbb{R}^d$ such that the Jacobian matrix $\nabla h = (\partial_{x_j} h_i)_{i,j \in \{1,\dots,d\}}$ satisfies $\nabla h(x) = \sigma^{-1}(x)$ and such that $\nabla h$ is a locally Lipschitz function on $\mathbb{R}^d$.

Remark 1.3 (On the regularity of the diffusion matrix). Under $(H_0)$ and $(H_2)$, $h$ is a global $C^1$-diffeomorphism from $\mathbb{R}^d$ to $\mathbb{R}^d$. Indeed, under these assumptions, $\nabla h$ is invertible everywhere and $x \mapsto [(\nabla h)(x)]^{-1} = \sigma(x)$ is bounded on $\mathbb{R}^d$. Then, the property (which will be important in the sequel) follows from the Hadamard-Lévy theorem (see e.g. [35]).

As mentioned before, the main restriction here is to assume that $x \mapsto \sigma^{-1}(x)$ is a Jacobian matrix. However, note that there is no assumption on $h$ (except smoothness). In particular, $\sigma^{-1}$ does not need to be bounded. This allows us to consider for instance some cases where $\sigma$ vanishes at infinity.

Let us exhibit some simple classes of SDEs for which $(H_2)$ is fulfilled. First, it contains the class of non-degenerate SDEs for which each coordinate is directed by one real-valued fBm. More precisely, if for every $i \in \{1, \dots, d\}$,

$$ dX_t^i = b_i(X_t^1, \dots, X_t^d)\,dt + \sigma_i(X_t^1, \dots, X_t^d)\,dB_t^i $$

where $\sigma_i : \mathbb{R}^d \to \mathbb{R}$ is a $C^1$ positive function, then Assumption $(H_2)$ holds. Now, let us also remark that since, for a given constant matrix $P$, $\nabla(Ph) = P\nabla h$, we have the following equivalence:

$$ \exists\, h \text{ such that } \nabla h = \sigma^{-1} \iff \exists\, \tilde h,\ \exists \text{ an invertible matrix } P \text{ such that } \sigma^{-1} = P\nabla\tilde h. $$

One deduces from this property that $(H_2)$ also holds true if

$$ \sigma(x) = P\,\mathrm{Diag}\big(\sigma_1(x_1, \dots, x_d), \dots, \sigma_d(x_1, \dots, x_d)\big) $$

where $P$ is a given invertible $d \times d$-matrix and, for every $i \in \{1, \dots, d\}$, $\sigma_i$ has the same properties as before.

We are now able to state our main result. One denotes by $\mathcal{L}((X_t^{\mu_0})_{t\ge 0})$ the distribution of the process on the set $C([0, +\infty), \mathbb{R}^d)$ starting from an initial distribution $\mu_0$, and by $\bar{\mathcal{Q}}\mu$ the distribution of the stationary solution (starting from an invariant distribution $\mu$). The distribution $\bar\mu_0(dx)$ denotes the first marginal of $\mu_0(dx, dw)$.

Theorem 1.4. Let $H \in (1/2, 1)$. Assume $(H_0)$, $(H_1)$ and $(H_2)$. Then, existence and uniqueness hold for the invariant distribution $\mu$ (up to equivalence). Furthermore, for every initial distribution $\mu_0$ for which there exists $r > 0$ such that $\int |x|^r \bar\mu_0(dx) < \infty$, for each $\varepsilon > 0$ there exists $C_\varepsilon > 0$ such that

$$ \left\| \mathcal{L}\big((X_{t+s}^{\mu_0})_{s\ge 0}\big) - \bar{\mathcal{Q}}\mu \right\|_{TV} \le C_\varepsilon\, t^{-(\frac18 - \varepsilon)}. $$

Remark 1.5. In the previous result, the main contribution is the fact that one is able to recover the rates of the additive case. Existence and uniqueness results are not really new. However, compared with the assumptions of [18], one observes that when $x \mapsto \sigma^{-1}(x)$ is a Jacobian matrix (an assumption which does not appear in [18]), our other assumptions are slightly less constraining. In particular, $b$ is assumed to be locally Lipschitz and sublinear (instead of Lipschitz continuous) and, as mentioned before, $x \mapsto \sigma^{-1}(x)$ does not need to be bounded.

Some ingredients of the proof. The aim of this section is to give some ideas of the proof. As explained above, the scheme is similar to that of [17]. The starting point is to consider a couple $(X, \tilde X)$ of solutions to (1) with respective driving fBms denoted by $B$ and $\tilde B$. The underlying innovation processes are denoted by $W$ and $\tilde W$. For the sake of simplicity, assume that $X_0 = x \in \mathbb{R}^d$ and that the initial condition of $\tilde X$ is the invariant distribution $\mu$. For every $k \ge 1$, denote by $\tau_{k-1}$ the starting time of the $k$-th coupling attempt and by $\Delta\tau_k$ its duration. If the coupling is successful, $X$ and $\tilde X$ get stuck after Step 1, i.e. at time $\tau_{k-1} + 1$. We thus define $\tau_\infty := \tau_{k^*-1} + 1$ where $k^* := \inf\{k \ge 1, \Delta\tau_k = +\infty\}$. By construction,

$$ \forall t > 0, \quad \left\| \mathcal{L}\big((X_{t+s}^{x})_{s\ge 0}\big) - \bar{\mathcal{Q}}\mu \right\|_{TV} \le \mathbb{P}(\tau_\infty > t). $$
Furthermore, using that

$$ \forall t \ge 1, \quad \mathbb{P}(\tau_\infty > t) = \mathbb{P}\Big(\tau_0 + \sum_{k=1}^{+\infty} \Delta\tau_k \mathbf{1}_{\{k^* > k\}} > t - 1\Big), $$

it can be shown (see [11], Section 5) that the problem can be more or less reduced to the control (uniform in $k$) of

$$ \mathbb{P}\big(\Delta\tau_k < +\infty \,\big|\, \Delta\tau_{k-1} < +\infty, \mathcal{F}_{\tau_{k-1}}\big) \quad\text{and}\quad \mathbb{E}\big[|\Delta\tau_k|^p \,\big|\, \Delta\tau_k < +\infty, \mathcal{F}_{\tau_{k-1}}\big], $$

where $(\mathcal{F}_t)_{t\ge 0}$ is the usual augmentation of $(\sigma(W_s, \tilde W_s)_{s\le t})_{t\ge 0}$ and $p \in (0,1)$ is a real number that one tries to maximize (as suggested by Theorem 1.4, this expectation will be finite as soon as $p < 1/8$).

$(K, \alpha)$-admissibility. The quantities defined previously can be controlled only if the positions of each component and the past of their noise satisfy some conditions at the beginning of the attempt. One then speaks of $(K, \alpha)$-admissibility. In order to define this concept, we now assume that $W$ and $\tilde W$ differ by a drift term denoted by $g_w$: $d\tilde W_t = dW_t + g_w(t)\,dt$. The function $g_w$ is supposed to be null before $\tau_0$, i.e. before the first attempt, and also during Step 3. In order to quantify the impact of $g_w$ on the future attempts, one introduces an operator $\mathcal{R}_T$ defined (when it makes sense) by: for all $T \ge 0$ and for all $g : \mathbb{R} \to \mathbb{R}$,

$$ (\mathcal{R}_T g)(t) = \int_{-\infty}^{0} \frac{t^{\frac12 - H}(T - s)^{H - \frac12}}{t + T - s}\, g(s)\,ds, \quad t \in (0, +\infty). $$


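The admissibility condition can then be defined as follows:

Definition 1.6. Let $K$ and $\alpha$ be some positive numbers and $\tau$ be an $(\mathcal{F}_t)_{t\ge 0}$-stopping time. One says that the system is $(K, \alpha)$-admissible at time $\tau$ if $\tau(\omega) < +\infty$ and if $(X^1_\tau(\omega), X^2_\tau(\omega), (W^1_t(\omega), W^2_t(\omega))_{t\le\tau})$ satisfies

$$ \sup_{T \ge 0} \int_0^{+\infty} (1+t)^{2\alpha} \left| (\mathcal{R}_T g_w^\tau)(t) \right|^2 dt \le 1 \quad\text{where } g_w^\tau(\cdot) = g_w(\cdot + \tau) \tag{3} $$

is the shift of the drift term between $W$ and $\tilde W$, and if $(X^1_\tau(\omega), X^2_\tau(\omega), (W^1_t(\omega), W^2_t(\omega))_{t\le\tau}) \in \Omega_{K,\alpha,\tau}$ where

$$ \Omega_{K,\alpha,\tau} := \left\{ |X^1_\tau(\omega)| \le K,\ |X^2_\tau(\omega)| \le K,\ \varphi_{\tau,\varepsilon_\theta}(W^1(\omega)) \le K \text{ and } \varphi_{\tau,\varepsilon_\theta}(W^2(\omega)) \le K \right\}, \tag{4} $$

and $\varepsilon_\theta = \frac{H - \theta}{2}$ with $\theta \in (1/2, H)$ and, for all $\varepsilon > 0$,

$$ \varphi_{\tau,\varepsilon}(w) = \sup_{\tau \le s < t \le \tau+1} \left| \frac{1}{t-s} \int_{-\infty}^{\tau-1} \left( (t-r)^{H-\frac12} - (s-r)^{H-\frac12} \right) dw(r) \right| + \sup_{\tau-1 \le u < v \le \tau} \frac{|w(v) - w(u)|}{|v-u|^{\frac12 - \varepsilon}}. $$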

The $(K, \alpha)$-admissibility is a sufficient condition to ensure that the coupling succeeds with lower-bounded probability (i.e. such that the paths remain stuck until infinity). More precisely, under this condition, one is able to show that there exists $\delta_0 > 0$ such that for every $k \ge 1$,

$$ \mathbb{P}\big(\Delta\tau_k = +\infty \,\big|\, \{\Delta\tau_{k-1} < +\infty\} \cap \Omega_{K,\alpha,\tau_{k-1}}\big) \ge \delta_0. $$


Condition (3), which comes from [17], is adapted for Step 2, ensuring that if $X_{\tau_{k-1}+1} = \tilde X_{\tau_{k-1}+1}$ (i.e. if Step 1 succeeds),
one can build some couples (W, W) on successive intervals of lengths 2 aN such that the probability that the paths remain 
stuck until infinity is lower-bounded. The second condition plays an important role for Step 1. When $\omega \in \Omega_{K,\alpha,\tau_{k-1}}$, the cost to stick the paths can be uniformly controlled under Assumption $(H_2)$. Then, one of the main difficulties of the proof in the multiplicative case is to show that the probability of $(K, \alpha)$-admissibility is also lower-bounded: one needs to show that there exists $\delta_1 > 0$ such that for every $k \ge 1$,

$$ \mathbb{P}\big(\Omega_{K,\alpha,\tau_k} \,\big|\, \Delta\tau_{k-1} < +\infty\big) \ge \delta_1. $$

The proof of this property is achieved in two main steps. In the first one, one needs to control the increments of $W$ and $\tilde W$ before $\tau_k$, conditionally on the failure of the previous attempts. This control requires a sharp knowledge of the construction of the innovations used in Steps 1 and 2 in order to identify the distortions generated by each scenario of failed attempt. Then, plugging these controls into the Mandelbrot-Van Ness representation allows us to estimate the impact of these distortions on the future of the increments of the fBm (see Lemmas 4.5, 4.6 and 4.7 of [11] for more details).

With the help of the Lyapunov assumption $(H_1)$, one tries in the second step to upper-bound the quantity $\mathbb{E}[V^r(X_{\tau_k}) + V^r(\tilde X_{\tau_k}) \,|\, \Delta\tau_k < +\infty]$ (for a positive power $r$). Conditionally on the first step, the key point is the following contraction property (Proposition 4.4 of [11]): if $(H_0)$ and $(H_1)$ hold, then, for every $\theta \in (1/2, H)$, there exist $\rho \in (0,1)$, $C > 0$ and $r > 0$ such that for every starting point $x \in \mathbb{R}^d$,

$$ V^r(X_1) \le \rho V^r(x) + C\left(1 + \|B^H\|_{\theta}^{0,1}\right) \quad\text{where}\quad \|B\|_{\theta}^{0,1} = \sup_{0 \le s < t \le 1} \frac{|B_t - B_s|}{(t-s)^{\theta}}. \tag{5} $$

We refer to [11] for the complete proof.

2. The penalized bandit process 

The two-armed bandit algorithm is a theoretical procedure to choose asymptotically the most profitable arm of a slot machine, or bandit; it has also been used in mathematical psychology and in engineering. This algorithm has been widely studied, for instance in [27,29]. The key idea is to use a (deterministic) sequence of learning rates, rewarding an arm if it delivers a gain. Depending on the speed of convergence to 0 of this sequence, the algorithm is often fallible (it does not always select asymptotically the right arm, see [29]).

It is possible to improve its results and ensure infallibility by introducing penalties when the arm does not deliver a gain: this modification is called the penalized bandit algorithm, and it is studied in [28]. The authors show that, with a correct choice of penalties and rewards, and with the appropriate renormalization, the algorithm converges weakly to a probability measure $\pi$, which is the stationary distribution of the Piecewise Deterministic Markov Process with the following infinitesimal generator

$$ \mathcal{L}f(x) = (1 - p - px) f'(x) + qx\,\frac{f(x+g) - f(x)}{g}, $$

where $0 < q < p < 1$ are the respective probabilities of gain of the two arms and the positive parameter $g$ governs the asymptotic behaviour of the sequences of rewards and penalties (see Section 3 in [28] for more details). This process is also studied in [13]. For the sake of simplicity, we set $g = 1$ in the sequel. Moreover, since the interval $[0, (1-p)/p)$ is transient, computations are easier if we study the translated process $Y = X - \frac{1-p}{p}$, driven by the following generator:

$$ \mathcal{L}f(y) = -p y f'(y) + q\left(y + \frac{1-p}{p}\right)\big(f(y+1) - f(y)\big). \tag{6} $$

It is possible to deduce the dynamics of the process from the generator (see [9]): between the jumps, $Y$ evolves as the solution of the ODE $y'_t = -p y_t$, and it jumps from $Y_t$ to $Y_t + 1$ with jump rate $t \mapsto \zeta(Y_t) = q\left(Y_t + \frac{1-p}{p}\right)$.
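
The dynamics just described (exponential decay at rate $p$ between jumps, jumps of size 1 at rate $q(Y_t + \frac{1-p}{p})$) can be simulated directly. The following is a minimal sketch using a thinning (acceptance-rejection) scheme, which is a choice made here and not taken from [28] or [9]; it relies on the fact that, for $y \ge 0$, the jump rate is non-increasing along the flow. The function name `simulate_bandit` and the parameter values in the usage lines are illustrative.

```python
import math
import random


def simulate_bandit(y0, t_end, p, q, rng):
    """Return Y at time t_end for the translated penalized bandit process started at y0 >= 0."""
    c = (1.0 - p) / p
    t, y = 0.0, y0
    while True:
        # Between jumps y decreases, so the current rate bounds all future rates.
        lam_max = q * (y + c)
        dt = rng.expovariate(lam_max)
        if t + dt >= t_end:
            return y * math.exp(-p * (t_end - t))
        t += dt
        y *= math.exp(-p * dt)
        # Accept the proposed jump with probability (true rate) / (bounding rate).
        if rng.random() * lam_max <= q * (y + c):
            y += 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [simulate_bandit(3.0, 2.0, 0.6, 0.3, rng) for _ in range(4000)]
    print(sum(samples) / len(samples))
```

A Monte Carlo average of such samples can be compared with the first moment, which solves the linear ODE $h'(t) = \frac{q(1-p)}{p} - (p-q)h(t)$ obtained by applying the generator to $f(y) = y$.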

In [28], the authors show that $\pi$ admits a density with support $[(1-p)/p, +\infty)$ and exponential moments of order up to $u_M$, where $u_M$ is the unique positive solution of the equation

$$ \frac{\exp(u_M) - 1}{u_M} = \frac{p}{q}. \tag{7} $$

We shall prove the latter too, but with a different argument (see Remark 2.2).
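
The threshold $u_M$ has no closed form, but it is easy to compute numerically. The sketch below assumes the equation reads $(e^{u} - 1)/u = p/q$ and solves it by bisection, using that the left-hand side increases from 1 on $(0, +\infty)$ and that $p/q > 1$. The function name `exponential_moment_threshold` is an illustrative choice.

```python
import math


def exponential_moment_threshold(p, q, tol=1e-12):
    """Unique u > 0 with (exp(u) - 1) / u = p / q, for 0 < q < p < 1."""
    if not 0.0 < q < p < 1.0:
        raise ValueError("expected 0 < q < p < 1")
    target = p / q

    def gap(u):
        return math.expm1(u) / u - target

    lo, hi = 1e-12, 1.0
    while gap(hi) < 0.0:  # enlarge the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(exponential_moment_threshold(0.6, 0.3))
```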

In the sequel, we call penalized bandit process the process with initial distribution $\mu_0$ following the dynamics of $\mathcal{L}$, and denote by $\mu_t$ its law at time $t \ge 0$. This section is devoted to the long-time behavior of this process with respect to Wasserstein and total variation distances. The estimates rely on the construction of explicit couplings. This approach is closely related to the paper [7], which is dedicated to the study of a Piecewise Deterministic Markov Process related to a pharmacokinetic model introduced in [2].

2.1. Wasserstein convergence

First, let us recall the definitions of the Wasserstein and total variation distances between two measures:

$$ W_n(\mu, \nu) = \inf\left\{ \mathbb{E}\left[|X - Y|^n\right]^{\frac1n} : (X, Y) \text{ coupling of } \mu \text{ and } \nu \right\}, $$

$$ \|\mu - \nu\|_{TV} = \inf\left\{ \mathbb{P}(X \ne Y) : (X, Y) \text{ coupling of } \mu \text{ and } \nu \right\}. $$

In the following, let $\mu_0$ and $\tilde\mu_0$ be two probabilities on $\mathbb{R}_+$. The following proposition holds:

Proposition 2.1. We have, for all $t \ge 0$,

$$ W_1(\mu_t, \tilde\mu_t) \le W_1(\mu_0, \tilde\mu_0)\, e^{-(p-q)t}. \tag{8} $$


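Proof. Let $(Y, \tilde Y)$ be generated by

$$ \mathcal{L}_2 f(y, \tilde y) = -p y\,\partial_y f(y, \tilde y) - p \tilde y\,\partial_{\tilde y} f(y, \tilde y) + q(y - \tilde y)\big(f(y+1, \tilde y) - f(y, \tilde y)\big) + q\left(\tilde y + \frac{1-p}{p}\right)\big(f(y+1, \tilde y+1) - f(y, \tilde y)\big), \tag{9} $$

for $y \ge \tilde y$, with the symmetric expression for $\tilde y > y$, and such that $(Y_0, \tilde Y_0)$ is a coupling of $(\mu_0, \tilde\mu_0)$ realizing $W_1(\mu_0, \tilde\mu_0) = \mathbb{E}[|Y_0 - \tilde Y_0|]$. It is easy to check that (9) reduces to (6) if $f$ depends only on $y$ or on $\tilde y$, which means that $(Y_t, \tilde Y_t)_{t\ge 0}$ generated by $\mathcal{L}_2$ is a coupling of $(\mu_t, \tilde\mu_t)_{t\ge 0}$. With this coupling, either the higher process jumps alone or the jump is simultaneous. It is easy to check that this coupling is monotone, i.e. for all $t \ge 0$, $(Y_t - \tilde Y_t)(Y_0 - \tilde Y_0) \ge 0$. Monotonicity comes from the fact that the higher process jumps more often but stays above the other since the jumps are positive. Assume that $Y_0 \ge \tilde Y_0$. By monotonicity, we have, for all $t \ge 0$,

$$ \mathbb{E}\left[|Y_t - \tilde Y_t|\right] = \mathbb{E}[Y_t] - \mathbb{E}[\tilde Y_t], $$

so all we have to do is to study $h : t \mapsto \mathbb{E}[Y_t]$. With $f(y) = y$, (6) leads to $\mathcal{L}f(y) = \frac{q(1-p)}{p} - (p-q)y$, and, by the Dynkin formula, the function $h$ satisfies the ordinary differential equation $h'(t) = \frac{q(1-p)}{p} - (p-q)h(t)$. One deduces immediately that

$$ \mathbb{E}[Y_t] = \frac{q(1-p)}{p(p-q)} + \left( \mathbb{E}[Y_0] - \frac{q(1-p)}{p(p-q)} \right) e^{-(p-q)t} \tag{10} $$

(recall that $p > q$). Then,

$$ \mathbb{E}\left[|Y_t - \tilde Y_t|\right] = \mathbb{E}\left[|Y_0 - \tilde Y_0|\right] e^{-(p-q)t}, $$

which leads directly to (8). □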
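
The coupling used in this proof can be sketched in code: both copies decay at rate $p$, the lower copy's jumps are always shared by the higher one, and the higher copy has extra jumps of its own at rate $q$ times the gap. The implementation below is a hedged illustration (thinning scheme, names `simulate_coupled`, `y`, `z` chosen here), not the construction of a specific reference.

```python
import math
import random


def simulate_coupled(y0, z0, t_end, p, q, rng):
    """Return (Y, Z) at time t_end under the monotone coupling, started at (y0, z0) >= 0."""
    c = (1.0 - p) / p
    t, y, z = 0.0, y0, z0
    while True:
        # Total event rate is the jump rate of the higher copy; it decreases along the flow.
        lam_max = q * (max(y, z) + c)
        dt = rng.expovariate(lam_max)
        if t + dt >= t_end:
            decay = math.exp(-p * (t_end - t))
            return y * decay, z * decay
        t += dt
        decay = math.exp(-p * dt)
        y, z = y * decay, z * decay
        lo, hi = min(y, z), max(y, z)
        u = rng.random() * lam_max
        if u < q * (lo + c):
            y, z = y + 1.0, z + 1.0  # simultaneous jump
        elif u < q * (hi + c):
            if y > z:  # the higher copy jumps alone
                y += 1.0
            else:
                z += 1.0


if __name__ == "__main__":
    rng = random.Random(1)
    gaps = [abs(a - b) for a, b in (simulate_coupled(3.0, 1.0, 2.0, 0.6, 0.3, rng) for _ in range(4000))]
    print(sum(gaps) / len(gaps), 2.0 * math.exp(-0.3 * 2.0))
```

The empirical mean gap should be close to $|y_0 - z_0|\,e^{-(p-q)t}$, and the order of the two copies is preserved along every path.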


Remark 2.2. The Dynkin formula is a powerful tool for studying the moments of Markov processes. One can use it with $f(y) = e^{uy}$ to study the Laplace transform of the transient process, $\psi(t, u) = \mathbb{E}[e^{uY_t}]$. We have

$$ \mathcal{L}f(y) = \frac{q(1-p)}{p}(e^u - 1) f(y) + \big(q(e^u - 1) - up\big)\, y f(y), $$

so $\psi$ satisfies the following PDE:

$$ \partial_t \psi(t, u) = \frac{q(1-p)}{p}(e^u - 1)\,\psi(t, u) + \big(q(e^u - 1) - up\big)\,\partial_u \psi(t, u). $$

If $\mu_0$ is the invariant measure $\pi$, then $\partial_t \psi(t, u) = 0$, and the Laplace transform $u \mapsto \psi(u)$ is a solution of the following ODE:

$$ \partial_u \big(\log \psi(u)\big) = \frac{\frac{q(1-p)}{p}(e^u - 1)}{up - q(e^u - 1)}, $$

and the right-hand side is finite for $u \in [0, u_M)$, where $u_M$ is the solution of Equation (7).


Note that the set of polynomials of degree $n$ is stable under the action of $\mathcal{L}$. This is an important property, since it theoretically enables us to compute the moments of $Y_t$ by induction, with the help of the Dynkin formula, just as we did for the first moment in the proof of Proposition 2.1. Similarly, it is possible to study the function $h_n(t) = \mathbb{E}[|Y_t - \tilde Y_t|^n]$, whose $n$-th root provides an upper bound on $W_n(\mu_t, \tilde\mu_t)$. Indeed, with the coupling generator $\mathcal{L}_2$ of (9), we have, for $f(y, \tilde y) = |y - \tilde y|^n$,

$$ \mathcal{L}_2 f(y, \tilde y) = -n(p-q)|y - \tilde y|^n + q \sum_{k=0}^{n-2} \binom{n}{k} |y - \tilde y|^{k+1}, $$

so

$$ h_n'(t) = -n(p-q)\,h_n(t) + q \sum_{k=0}^{n-2} \binom{n}{k} h_{k+1}(t). $$

Then, using the Gronwall lemma, one easily checks by induction that $h_n(t) = O(e^{-(p-q)t})$, which leads to the following result:

Proposition 2.3. For all $n \in \mathbb{N}^*$, there exists a positive constant $C$ such that, for all $t \ge 0$,

$$ W_n(\mu_t, \tilde\mu_t) \le C \exp\left(-\frac{p-q}{n}\,t\right). $$

2.2. Total variation convergence 

In the case of the penalized bandit process, total variation convergence is slightly harder to obtain than in [7], since the jumps always have amplitude 1. Instead, we are going to use the arguments introduced in [1], based on the following observation: if $Y$ and $\tilde Y$ are close enough, we can make them jump, not simultaneously as before, but with a slight delay for one of the copies, so that it jumps onto the other one, as illustrated in Figure 1.

FIGURE 1. Expected behaviour of the coalescent coupling for the penalized bandit process.

In the following, denote by $\tau = \inf\{t \ge 0 : \forall s \ge 0,\ Y_{t+s} = \tilde Y_{t+s}\}$ the coalescence time of $Y$ and $\tilde Y$. The goal of the sequel is to obtain exponential moments for $\tau$ (which may exist for suitable couplings) and then use the classical coupling inequality for the total variation distance:

$$ \|\mu_t - \tilde\mu_t\|_{TV} \le \mathbb{P}(Y_t \ne \tilde Y_t) \le \mathbb{P}(\tau > t). $$

We have the following lemma: 

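Lemma 2.4. Assume there exist positive constants $\bar y < +\infty$, $\bar\varepsilon < 1$ such that $Y_0, \tilde Y_0 \le \bar y$ and $|Y_0 - \tilde Y_0| \le \bar\varepsilon$. Then, there exist a coupling $(Y_t, \tilde Y_t)_{t\ge 0}$ of $(\mu_t, \tilde\mu_t)_{t\ge 0}$ and an explicit positive constant $C(\bar y, \bar\varepsilon) < +\infty$ such that, for all $t \ge 0$,

$$ \mathbb{P}(\tau > t) \le C(\bar y, \bar\varepsilon) \left( \bar\varepsilon + \exp\left(-\frac{q(1-p)}{p}\,t\right) \right). \tag{11} $$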
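
Proof. First, assume that $Y_0$ and $\tilde Y_0$ are deterministic, and denote $y = Y_0$, $\varepsilon = \tilde Y_0 - y$. We assume w.l.o.g. that $\varepsilon \ge 0$. Let $T$ (resp. $\tilde T$) be the first jump time of the process $Y$ (resp. $\tilde Y$). Following the heuristics of Figure 1, it is straightforward that

$$ Y_T = \tilde Y_T \iff T = \frac1p \log\left(\varepsilon + e^{p\tilde T}\right). $$

Easy computations lead to

$$ \mathbb{P}\left(\frac1p \log\left(\varepsilon + e^{p\tilde T}\right) \le s\right) = \mathbb{P}\left(\tilde T \le \frac1p \log\left(e^{ps} - \varepsilon\right)\right) = 1 - \Phi_y(s, \varepsilon), $$

with

$$ \Phi_y(s, \varepsilon) = \exp\left( -\frac{q}{p}\left[ \frac{1-p}{p}\log\left(e^{ps} - \varepsilon\right) + (y + \varepsilon)\left(1 - \frac{1}{e^{ps} - \varepsilon}\right) \right] \right). $$

As a consequence, the random variables $T$ and $\frac1p \log(\varepsilon + \exp(p\tilde T))$ admit densities w.r.t. the Lebesgue measure, which are respectively $f_y(\cdot, 0)$ and $f_y(\cdot, \varepsilon)$, with, for all $s \ge 0$,

$$ f_y(s, \varepsilon) = \frac{q e^{ps}}{p}\left( \frac{1-p}{e^{ps} - \varepsilon} + \frac{p(y + \varepsilon)}{(e^{ps} - \varepsilon)^2} \right) \Phi_y(s, \varepsilon). $$

Let $T$ and $\frac1p \log(\varepsilon + \exp(p\tilde T))$ follow the $\gamma$-coupling (the coupling minimizing the total variation of their laws, see [30]). It is not hard to deduce from the very construction of this coupling the following equality:

$$ \mathbb{P}\left(T = \frac1p \log\left(\varepsilon + e^{p\tilde T}\right),\ T \le t\right) = \int_0^t f_y(s, 0) \wedge f_y(s, \varepsilon)\,ds, $$

where $x \wedge y = \min(x, y)$, and then

$$ \mathbb{P}(\tau \le t) = \mathbb{P}(Y_t = \tilde Y_t) \ge \mathbb{P}\left(T = \frac1p \log\left(\varepsilon + e^{p\tilde T}\right),\ T \le t\right) \ge 1 - \frac12\left( \Phi_y(t, 0) + \Phi_y(t, \varepsilon) + \int_0^t |f_y(s, 0) - f_y(s, \varepsilon)|\,ds \right). \tag{12} $$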

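The following upper bound is easily obtained, for any $0 \le \varepsilon \le \bar\varepsilon$ and any $0 \le y \le \bar y$:

$$ \Phi_y(s, \varepsilon) \le C_1 \exp\left(-\frac{q(1-p)}{p}\,s\right), \tag{13} $$

with $C_1 = \exp\left( -\frac{q}{p}\left[ \frac{1-p}{p}\log(1 - \bar\varepsilon) + (\bar y + \bar\varepsilon)\left(1 - \frac{1}{1 - \bar\varepsilon}\right) \right] \right)$. In order to apply the mean-value theorem, we differentiate $f_y$ with respect to $\varepsilon$. After some computations, one can obtain the following upper bound:

$$ \left| \frac{\partial f_y}{\partial \varepsilon}(s, \varepsilon) \right| \le C_1 C_2 \exp\left(-\frac{q(1-p)}{p}\,s\right), \tag{14} $$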

where 


g 2 ((i ~~p)(i — £ ) + p{y + £ )) (, ( ( 2 -p)(i-£) + y + £ \\ 

P 2 { l -£) 2 V V ( l -£) 2 )) 

9(1 - £ + 2 p(y + £)) 


p(l — £) 3 

Then, we easily have

$$ \int_0^t |f_y(s, 0) - f_y(s, \varepsilon)|\,ds \le C_1 C_2\, \varepsilon \int_0^{+\infty} \exp\left(-\frac{q(1-p)}{p}\,s\right) ds \le \frac{p\,C_1 C_2}{q(1-p)}\,\varepsilon. \tag{15} $$

Combining Equations (12), (13), (14) and (15), and denoting $C(\bar y, \bar\varepsilon) = C_1 + \frac{p\,C_1 C_2}{2q(1-p)}$, we have

$$ \mathbb{P}(\tau > t) \le C(\bar y, \bar\varepsilon)\left( \exp\left(-\frac{q(1-p)}{p}\,t\right) + \varepsilon \right), $$

then (11) is straightforward. This upper bound does not depend on $Y_0$ and $\tilde Y_0$, so this result still holds for random starting points, provided that they belong to $[0, \bar y]$. □

Proposition 2.1 and Lemma 2.4 are the main tools to prove the following result: 

Proposition 2.5. Let $t_0 > 0$. There exists an explicit positive constant $K < +\infty$ such that, for all $t \ge t_0$,

$$ \|\mu_t - \tilde\mu_t\|_{TV} \le K e^{-vt}, \quad\text{with}\quad v = \frac{p - q}{2 + \frac{p(p-q)}{q(1-p)}}. \tag{16} $$

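Proof. Let $\alpha \in (0, 1)$ and $u > 0$. We use the coupling from Proposition 2.1 in the domain $[0, \alpha t]$ and the coupling from Lemma 2.4 in the domain $[\alpha t, t]$. We set $\bar\varepsilon = e^{-ut}$ and $\bar y = \frac{q(1-p)}{p(p-q)} + 1$, and have the following inequality:

$$ \mathbb{P}(\tau \le t) \ge \mathbb{P}\left( |Y_{\alpha t} - \tilde Y_{\alpha t}| \le \bar\varepsilon,\ Y_{\alpha t} \vee \tilde Y_{\alpha t} \le \bar y \right) \mathbb{P}\left( \tau \le t \,\Big|\, |Y_{\alpha t} - \tilde Y_{\alpha t}| \le \bar\varepsilon,\ Y_{\alpha t} \vee \tilde Y_{\alpha t} \le \bar y \right). \tag{17} $$

On the one hand,

$$ \mathbb{P}\left( |Y_{\alpha t} - \tilde Y_{\alpha t}| > \bar\varepsilon \text{ or } Y_{\alpha t} \vee \tilde Y_{\alpha t} > \bar y \right) \le \mathbb{P}\left( |Y_{\alpha t} - \tilde Y_{\alpha t}| > \bar\varepsilon \right) + \mathbb{P}\left( Y_{\alpha t} > \bar y \right) + \mathbb{P}\left( \tilde Y_{\alpha t} > \bar y \right) \le C_3 \exp\big((u - \alpha(p-q))t\big), $$

with $C_3 = W_1(\mu_0, \tilde\mu_0) + \mathbb{E}[Y_0 + \tilde Y_0]$. On the other hand, let $C_4 = \sup_{t \ge t_0} C(\bar y, e^{-ut})$. The constant $C_4$ is finite and, from Lemma 2.4,

$$ \mathbb{P}\left( \tau > t \,\Big|\, |Y_{\alpha t} - \tilde Y_{\alpha t}| \le \bar\varepsilon,\ Y_{\alpha t} \vee \tilde Y_{\alpha t} \le \bar y \right) \le C_4 \left( e^{-ut} + \exp\left( -\frac{q(1-p)(1-\alpha)}{p}\,t \right) \right). $$

Now, (17) reduces to

$$ \mathbb{P}(\tau > t) \le 1 - \Big(1 - C_3 \exp\big((u - \alpha(p-q))t\big)\Big)\left( 1 - C_4\left( e^{-ut} + \exp\left( -\frac{q(1-p)(1-\alpha)}{p}\,t \right) \right) \right). $$

We optimize the rate of convergence by setting

$$ \alpha = \frac{1}{1 + \frac{p(p-q)}{2q(1-p)}}, \qquad u = \frac{(p-q)\,\alpha}{2} = v \text{ as defined above.} $$

Then, (16) holds with $K = C_3 + 2C_4$. □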
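
The optimization at the end of this proof balances three exponential rates: $\alpha(p-q) - u$ from the Wasserstein phase, $u$ from the choice of the closeness threshold, and $q(1-p)(1-\alpha)/p$ from the coalescence phase. The sketch below simply evaluates the resulting $\alpha$, $u$ and $v$ so that one can check numerically that the three rates coincide; the function name `tv_rate` is an illustrative choice.

```python
def tv_rate(p, q):
    """Return (alpha, u, v) for the total variation bound, assuming 0 < q < p < 1."""
    if not 0.0 < q < p < 1.0:
        raise ValueError("expected 0 < q < p < 1")
    gap = p - q
    ratio = p * gap / (q * (1.0 - p))
    alpha = 1.0 / (1.0 + ratio / 2.0)  # share of time spent in the Wasserstein phase
    u = gap * alpha / 2.0              # decay rate of the closeness threshold exp(-u t)
    v = gap / (2.0 + ratio)            # rate appearing in the total variation bound
    return alpha, u, v


if __name__ == "__main__":
    alpha, u, v = tv_rate(0.6, 0.3)
    print(alpha, u, v)
```

For instance, larger gaps $p - q$ speed up the Wasserstein phase, while the coalescence phase is governed by $q(1-p)/p$, so the overall rate $v$ is a compromise between the two.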

3. Long time synchronization of large populations of interacting noisy rotators

3.1. The model 

Synchronization phenomena are widely studied in physics and the natural sciences. Synchronization can occur in several different situations, for example in the case of interacting cardiac cells, neurons, metronomes... (see [34] for numerous examples of synchronization phenomena). To construct a mathematical model in which a synchronization phenomenon occurs, one may consider a population of dynamical systems that interact with each other, and may perturb these systems with noise (to model the internal noise of each dynamical system, or the noise given by the interaction of each system with the surrounding environment). We will focus here on a model given by a population of $N$ interacting noisy rotators. Each rotator is defined by a phase $\varphi_j(t) \in \mathbb{T} = \mathbb{R}/2\pi\mathbb{Z}$, and the evolution of these phases is given by the following system of stochastic differential equations:



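$$ d\varphi_j(t) = -\frac{K}{N} \sum_{i=1}^{N} \sin\big(\varphi_j(t) - \varphi_i(t)\big)\, dt + \sigma\, dB_j(t), \qquad j = 1, \dots, N, \tag{18} $$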


where $K > 0$ and $\sigma > 0$ are two constant parameters, and $(B_j)_{j=1,\dots,N}$ is a family of independent standard Brownian
motions. The interaction is of mean field type: the rotator $\varphi_j$ interacts with all the other rotators, and the interaction term is
the sum of the contributions of each of these other rotators, all with the same weight $K/N$. Remark
that the model is invariant by rotation: if $(\varphi_j(t))_{j=1,\dots,N}$ is a solution of (18), then so is $(\varphi_j(t) + c)_{j=1,\dots,N}$
for every real constant $c$.

This model is known in the physics literature as the mean field plane rotors model, and is a particular case of the celebrated
noisy Kuramoto model (when the disorder follows the trivial distribution $\delta_0$). Of course, since the dynamics of each
isolated system is very simple in this model (a Brownian motion on a circle), its aim is not to describe a real phenomenon
(to do this one would need a more complex isolated system, in higher dimension and with several parameters), but to
provide a simple framework in which one can study a synchronization phenomenon analytically.

In this model we can speak of synchronization if the rotators concentrate around some phase, which we will call the center
of synchronization. This may happen if the interaction is strong enough with respect to the noise (since the latter
drives the rotators to move independently), or in other words if $K$ is sufficiently large compared to $\sigma$. But with a simple
time change one can replace $\sigma$ by 1, and $K$ by $K/\sigma^2$. So the only real parameter of the model is $K/\sigma^2$, and we will set $\sigma = 1$
in the remainder for simplicity.
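
As an illustration of the dynamics (18) with $\sigma = 1$, here is a minimal Python sketch based on an Euler-Maruyama discretization. It is not taken from the cited works: the function names, the time step `dt` and the uniform initial condition are assumptions made for this example. The sketch uses the elementary identity $-\frac{K}{N}\sum_i \sin(\varphi_j - \varphi_i) = -K r \sin(\varphi_j - \psi)$, where $r e^{i\psi} = \frac{1}{N}\sum_i e^{i\varphi_i}$ is the empirical order parameter, so that each step costs $O(N)$ operations.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Return (r, psi) such that r * exp(i * psi) = (1/N) * sum_j exp(i * phi_j)."""
    z = sum(cmath.exp(1j * phi) for phi in phases) / len(phases)
    return abs(z), cmath.phase(z)


def drift(phases, K):
    """Mean field drift -(K/N) * sum_i sin(phi_j - phi_i), computed as -K * r * sin(phi_j - psi)."""
    r, psi = order_parameter(phases)
    return [-K * r * math.sin(phi - psi) for phi in phases]


def simulate(N, K, T, dt, seed=0):
    """Euler-Maruyama scheme for the N rotators with sigma = 1 (illustrative discretization)."""
    rng = random.Random(seed)
    # Assumption of this sketch: phases start uniformly distributed on the circle.
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]
    for _ in range(int(round(T / dt))):
        d = drift(phases, K)
        phases = [
            (phi + dj * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)) % (2.0 * math.pi)
            for phi, dj in zip(phases, d)
        ]
    return phases
```

Calling `order_parameter(simulate(N, K, T, dt))` gives a rough numerical indicator of synchronization: a modulus $r$ close to 0 suggests an unsynchronized population, while a larger $r$ suggests concentration around the phase $\psi$.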

We will focus on the behavior of the model in the limit of infinite population size. We will first consider the evolution on
time intervals $[0, T]$ independent of $N$, when $N$ goes to infinity. We will then study the behavior of the model on longer
time intervals, by rescaling time in a way that depends on $N$. The content of the first part is based on the works [4,16],
while the second part describes the result proved in [5].

3.2. Large populations and fixed time intervals 

We consider in this section the evolution on intervals of the type $[0, T]$, with $T$ independent of $N$. In this case,
since the coefficients in (18) are smooth, we can apply to our model the classical results of the well-known theory of
propagation of chaos [14,36]. Let us consider the empirical measure $\mu_{N,t}$ associated with the model, i.e. the $\mathcal{M}_1(\mathbb{T})$-valued
process $\mu_{N,t} = \frac{1}{N}\sum_{j=1}^{N} \delta_{\varphi_j(t)}$, where $\mathcal{M}_1(\mathbb{T})$ denotes the space of probability measures on $\mathbb{T}$. If the initial condition
$\mu_{N,0}$ converges weakly to some $p_0 \in \mathcal{M}_1(\mathbb{T})$, then the process $\mu_{N,t}$ converges weakly on $[0, T]$ to the deterministic trajectory
in $\mathcal{M}_1(\mathbb{T})$ that solves the following Fokker-Planck type partial differential equation:


$$ \partial_t p_t(\theta) = \frac{1}{2}\, \partial_\theta^2 p_t(\theta) - \partial_\theta \big[ p_t(\theta)\, J * p_t(\theta) \big], \tag{19} $$


where $*$ denotes the convolution operator and $J(\theta) = -K \sin(\theta)$. In this limit $p_t$ is the distribution of the infinite
population of rotators on $\mathbb{T}$ at time $t$. The mass is preserved by this evolution, and due to the presence of the Laplacian
term this PDE admits a unique solution for any initial condition $p_0 \in \mathcal{M}_1(\mathbb{T})$, which admits a smooth and positive density
for all $t > 0$ (that we will also denote $p_t(\theta)$ for simplicity), element of $C^\infty((0, T) \times \mathbb{T}, \mathbb{R})$ (see for example [4] for a proof
of this regularity result). Remark that the invariance by rotation of the finite size model is conserved in the limit, since
one can easily check that if $p_t(\theta)$ is a solution of (19), then so is $p_t(\theta - c)$ for all $c \in \mathbb{R}$.

A pleasant property of this PDE is that one can compute all its stationary solutions in a semi-explicit way: $q$ is a
(probability) stationary solution of (19) if and only if $q$ can be expressed as

$$ q(\theta) = q_{\psi,r}(\theta) := \frac{e^{2Kr\cos(\theta-\psi)}}{\int_{\mathbb{T}} e^{2Kr\cos(\theta'-\psi)}\, d\theta'}, \tag{20} $$

for some $\psi \in \mathbb{R}$ and some solution $r \ge 0$ of the fixed point problem

$$ r = \Psi(2Kr), \tag{21} $$

where the fixed point function $\Psi$ is known explicitly, and satisfies some nice properties that allow us to determine the
number of solutions of the fixed point problem according to the value of $K$: $\Psi$ is strictly concave and bounded by 1 on
$(0, \infty)$, and satisfies $\Psi(0) = 0$ and $\Psi'(0) = 1/2$ (see [4] for more details).

First remark that the equality $\Psi(0) = 0$ implies that $r = 0$ is always a solution of the fixed point problem (21), which
means that the uniform probability on the circle $q(\theta) = \frac{1}{2\pi}$ is always a stationary solution. This solution corresponds to a
total absence of synchronization in the model, since in that case the population of rotators is distributed uniformly on
$\mathbb{T}$. Moreover, the fact that $\Psi'(0) = 1/2$ and the strict concavity of $\Psi$ imply that if $K \le 1$, $r = 0$ is the only solution of
(21): the map $r \mapsto \Psi(2Kr)$ has slope $K$ at the origin and, being strictly concave, stays strictly below the diagonal for $r > 0$.
When $K \le 1$ the interaction is too weak to allow the appearance of synchronized states.

On the other hand, if $K > 1$, then there exists a unique positive solution $r_K$ of the fixed point problem, which means
that in that case the set of stationary solutions of (19) is composed of $\frac{1}{2\pi}$ and a whole family $M$ of non-trivial stationary
solutions, defined as

$$ M = \{ q_\psi,\ \psi \in \mathbb{T} \}, \quad \text{where } q_\psi(\theta) = q_{\psi, r_K}(\theta). \tag{22} $$
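
The fixed point problem (21) is easy to explore numerically. The sketch below is only an illustration and rests on an assumption: the text says that $\Psi$ is known explicitly without writing it, and we take $\Psi(x) = \int_{\mathbb{T}} \cos\theta\, e^{x\cos\theta}\, d\theta \,/ \int_{\mathbb{T}} e^{x\cos\theta}\, d\theta$, which is what the self-consistency of the profile $q_{\psi,r}$ in (19) suggests and which satisfies $\Psi(0) = 0$ and $\Psi'(0) = 1/2$. The function names and the bisection tolerance are hypothetical choices.

```python
import math


def psi(x, n=2000):
    """Assumed form of Psi: ratio of circle integrals, evaluated by the midpoint rule."""
    num = 0.0
    den = 0.0
    for k in range(n):
        t = 2.0 * math.pi * (k + 0.5) / n
        w = math.exp(x * math.cos(t))
        num += math.cos(t) * w
        den += w
    return num / den


def synchronized_radius(K, tol=1e-12):
    """Positive solution r_K of r = Psi(2 K r) when K > 1; returns 0.0 otherwise."""
    if K <= 1.0:
        return 0.0

    def g(r):
        return psi(2.0 * K * r) - r

    # g is concave with g(0) = 0 and g(1) < 0, so it is positive exactly on (0, r_K).
    lo, hi = 0.5, 1.0
    while g(lo) <= 0.0:
        lo *= 0.5
        if lo < 1e-8:
            return 0.0  # K too close to 1 to resolve r_K at this accuracy
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `synchronized_radius(2.0)` returns the radius $r_K$ entering the synchronized profiles $q_\psi$ for $K = 2$, while any $K \le 1$ yields `0.0`, in line with the discussion above.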

Each stationary state $q_\psi$ is the translation by an angle $\psi$ of the profile $q_0$, which corresponds to a concentration of the
rotators around the phase 0 (see figure 3.2). So $q_\psi$ corresponds to a synchronization of the rotators around the center
of synchronization $\psi$. From a geometrical point of view $M$ is a closed curve (in fact a circle, since all its points are
translations of the same profile) of synchronized stationary solutions, parametrized by their centers of synchronization.

The next step in the understanding of the model is to determine the local stability of these different stationary states.
To do this let us linearize the evolution around these stationary states. If we rewrite (19) as $\partial_t p_t = F(p_t)$, with $F(p) =
\frac{1}{2} p'' - (p\, J * p)'$, and consider a stationary state $q$ and a smooth function $u$ satisfying $\int_{\mathbb{T}} u = 0$, we can expand $F(q + u)$
as follows:

$$ F(q + u) = F(q) + \frac{1}{2} u''(\theta) - \big[ q(\theta)\, J * u(\theta) + u(\theta)\, J * q(\theta) + u(\theta)\, J * u(\theta) \big]'. \tag{23} $$

$F(q) = 0$ since $q$ is stationary, and by keeping only the linear terms we get a linearized evolution $\partial_t u_t = L_q u_t$ around the
stationary profile $q$, with

$$ L_q u(\theta) := \frac{1}{2} u''(\theta) - \big[ q(\theta)\, J * u(\theta) + u(\theta)\, J * q(\theta) \big]'. \tag{24} $$

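The spectral properties of the operator $L_q$ determine the behavior of the solutions of (19) in the neighborhood of $q$. Its
spectrum can be obtained easily when $q = \frac{1}{2\pi}$, since in that case $L_{\frac{1}{2\pi}}$ can be decomposed on the Fourier basis: (24) boils
down to

$$ L_{\frac{1}{2\pi}} u(\theta) = \frac{1}{2} u''(\theta) - \frac{1}{2\pi} \big[ J * u(\theta) \big]', \tag{25} $$

and by writing $u(\theta) = \sum_{k=1}^{\infty} a_k \cos(k\theta) + \sum_{k=1}^{\infty} b_k \sin(k\theta)$ one simply gets (recall that $J(\theta) = -K \sin\theta$):

$$ L_{\frac{1}{2\pi}} u(\theta) = -\frac{1}{2}(1 - K)\, a_1 \cos\theta - \frac{1}{2}(1 - K)\, b_1 \sin\theta - \frac{1}{2} \sum_{k=2}^{\infty} k^2 a_k \cos(k\theta) - \frac{1}{2} \sum_{k=2}^{\infty} k^2 b_k \sin(k\theta). \tag{26} $$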
Figure 2. Graphs of $q_{-1.5}$, $q_0$, $q_1$, $q_2$ (from left to right) when $K = 2$. The elements of $M$ are
the translations of the synchronized profile $q_0$.

This shows that when $K < 1$ the spectrum of $L_{\frac{1}{2\pi}}$ is composed of strictly negative eigenvalues, and thus $\frac{1}{2\pi}$ is stable for
the linearized evolution. When $K > 1$ the directions given by $\cos\theta$ and $\sin\theta$ become unstable.
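
The diagonalization (26) can be checked numerically. The following sketch (an illustration with hypothetical function names, not part of the cited proofs) applies the operator (25) to $u(\theta) = \cos(k\theta)$, using the analytic second derivative and a midpoint quadrature for $[J * u]'(\theta) = -K \int_{\mathbb{T}} \cos(\theta - t)\cos(kt)\, dt$, and compares the result with the eigenvalue read off from (26).

```python
import math


def apply_L_uniform(k, K, theta, n=4000):
    """Evaluate (L u)(theta) for the operator (25) and u(theta) = cos(k * theta)."""
    second_derivative = -k * k * math.cos(k * theta)
    # [J * u]'(theta) = (J' * u)(theta) with J'(theta) = -K cos(theta); midpoint rule on the circle.
    h = 2.0 * math.pi / n
    conv = 0.0
    for m in range(n):
        t = (m + 0.5) * h
        conv += math.cos(theta - t) * math.cos(k * t)
    conv *= -K * h
    return 0.5 * second_derivative - conv / (2.0 * math.pi)


def eigenvalue(k, K):
    """Eigenvalue of the Fourier mode k >= 1 according to (26)."""
    return -0.5 * (1.0 - K) if k == 1 else -0.5 * k * k
```

With these definitions, `apply_L_uniform(k, K, theta)` should agree with `eigenvalue(k, K) * math.cos(k * theta)` up to quadrature error, and the sign of `eigenvalue(1, K)` changes exactly at $K = 1$.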

The study of the spectrum of $L_{q_\psi}$ for $q_\psi \in M$ when $K > 1$ is more involved, and requires the use of weighted Sobolev
spaces. In [4] the authors show that in a well chosen weighted space the operator $L_{q_\psi}$ can also be decomposed on an orthogonal
basis, and that the spectrum is made of a decreasing sequence of non positive reals $(\lambda_i)_{i \ge 0}$ satisfying $\lambda_0 = 0$, $\lambda_i < 0$ for
$i \ge 1$ and $\lambda_i \to -\infty$ as $i \to \infty$. The eigenvalue 0 is associated with the eigenvector $q_\psi'$, which generates the tangent space of $M$
at $q_\psi$. So the fact that the operator $L_{q_\psi}$ has no effect in this direction is not surprising, since in this direction the dynamics
given by (19) is trivial. On the normal space (that is the subspace generated by the other eigenfunctions), the linearized
dynamics is stable with a spectral gap given by the largest negative eigenvalue $\lambda_1$.

Knowing the existence of this spectral gap in the normal space, one can show that these spectral properties imply the
local stability of the curve $M$ for the nonlinear dynamics given by (19) (see [20] for a general proof, or [16] for a proof in
our particular model). In other words, if $p_0$ is sufficiently close to $M$, then there exists a phase $\psi$ such that $p_t \to q_\psi$ as $t \to \infty$.
This shows that synchronization is a stable phenomenon when $K > 1$. But this does not mean that $q_\psi$ alone is stable:
the trajectory starting from some perturbation $q_\psi + \varepsilon u$ of the profile $q_\psi$ may converge to a profile $q_{\psi'}$ with a phase $\psi' \neq \psi$
(with $\psi'$ converging to $\psi$ when $\varepsilon$ goes to 0).

One can in fact do far better than proving the local stability of $M$, by describing the dynamics completely. Using
in particular the free energy of the system [5,16], one can prove that, when $K > 1$, the solutions of (19) starting from
$U = \{p \in \mathcal{M}_1(\mathbb{T}) : \int_{\mathbb{T}} e^{i\theta}\, dp(\theta) = 0\}$ converge to $\frac{1}{2\pi}$, while if $p_0 \notin U$ then $p_t$ converges to some $q_\psi \in M$.

3.3. Long time behavior 

The end of the preceding section was devoted to the long time behavior of the limit PDE (19) of the model, that is the
limit in time of the model when the size of the population has already been sent to infinity. It is the long time behavior of
a deterministic system, given by the limit PDE. But when $N$ is large but finite, there is still noise in the system. So, in that
case, even if the model is very close to the limit PDE on a finite interval of time, its behavior can differ dramatically
from that of the limit PDE on very long times, due to the presence of this noise.

Since the noise present in the system disappears when $N$ goes to infinity, one can in a certain sense see the model
with $N$ large as a noisy perturbation of the limit PDE. So when $K > 1$ and the empirical measure $\mu_{N,t}$ is close to a
synchronized profile $q_\psi \in M$, the dynamics of the finite size model is given by a competition between the noise and
the operator $L_{q_\psi}$, since this operator dominates the limit dynamics in the neighborhood of $q_\psi$. The operator $L_{q_\psi}$ induces a negative
feedback on the normal space, so in this direction the process behaves like an infinite-dimensional version of an
Ornstein-Uhlenbeck process $dp_t = -\lambda p_t\, dt + \sqrt{a}\, dB_t$ with $a$ small. So the empirical measure cannot go far away in this direction,
and stays close to $M$. On the other hand $L_{q_\psi}$ has no effect on the tangent space of $M$ at $q_\psi$, and thus the process can
diffuse in this direction. This means that when $N$ is large $\mu_{N,t}$ stays close to some $q_{\psi_t^N}$, $\psi_t^N$ being some random phase
trajectory. Of course this random phase $\psi_t^N$ converges to a constant on finite time intervals (since the process converges
to a solution of the limit PDE), and thus to see a macroscopic effect of the noise in the limit one needs to rescale the time.
Since the noise is of size $\sqrt{t/N}$, the appropriate renormalization is to look at times of order $N$. This is the purpose of the
following theorem, which corresponds to Theorem 1.1 in [5]. We denote by $\|\cdot\|_{-1}$ the $H^{-1}$ norm.

Theorem 3.1. Suppose that $K > 1$ and let $\tau_f > 0$ and $\psi_0 \in \mathbb{T}$. If for all $\varepsilon > 0$

$$ \lim_{N \to \infty} P\Big( \big\| \mu_{N,0} - q_{\psi_0} \big\|_{-1} \le \varepsilon \Big) = 1, \tag{27} $$

then there exists a continuous process $(\psi_t^N)_{t \ge 0}$, adapted to the filtration generated by the sequence $(B_j)_{j=1,\dots,N}$, such
that for all $\varepsilon > 0$

$$ \lim_{N \to \infty} P\Big( \sup_{t \in [0, \tau_f]} \big\| \mu_{N,tN} - q_{\psi_t^N} \big\|_{-1} \le \varepsilon \Big) = 1, \tag{28} $$

and such that $\psi_\cdot^N$ converges weakly to $(\psi_0 + D_K W_t)_{t \ge 0}$, where $(W_t)_{t \ge 0}$ is a standard Brownian motion and $D_K$ is a constant
that can be computed explicitly (in terms of the positive solution $r_K$ of the fixed point problem (21)).

The proof of this theorem given in [5] is based on a discretization of the dynamics on an intermediate time-scale, and
on a projection of the empirical measure on $M$ at each time step to follow the fluctuations of the center of synchronization
induced by the noise. This procedure is inspired by the works [3,8], where the authors show the diffusive behavior of the
phase boundary for a one dimensional reaction-diffusion model with bistable potential and perturbed by a white noise.
Once this discretization is done, one of the main difficulties in the proof is to show that the discretized phase
dynamics does not contain any drift term in the limit $N \to \infty$. This is obtained using the symmetry properties
of the model: its invariance by rotation has already been pointed out, but one can also remark that if $p_t(\theta)$ is a
solution of (19), then so is $p_t(-\theta)$. See [5] for more details.

4. Wasserstein stability of traveling waves for scalar nonlinear advection-diffusion equations

This section addresses the long time behaviour of the scalar nonlinear advection-diffusion equation 


$$ \partial_t u + \partial_x (B(u)) = \frac{1}{2}\, \partial_x^2 (A(u)), \qquad t \ge 0, \quad x \in \mathbb{R}, \tag{29} $$


where $A$ and $B$ are $C^1$ functions, and $A'(u) = \sigma^2(u) \ge 0$. In particular, we aim at illustrating, and extending to
Wasserstein distances, the classical $L^1$ stability results for traveling waves of (29), which go back to Osher and Ralston [33]
as well as Freistühler and Serre [12]. We will use probabilistic arguments, based on the interpretation by Jourdain and
coauthors [23, 25, 26] of (29) as the Fokker-Planck equation of a nonlinear diffusion process. The connection between the
long time behaviour of this process and the traveling waves of (29) was recently pointed out to the author of this survey
by Julien Vovelle, to whom warm thanks are due.

4.1. Traveling waves and stationary solutions to (29) 

When $\sigma^2$ is constant, the equation (29) is a viscous scalar conservation law. A (bounded) traveling wave for this
equation is a function $\phi$ solving

$$ \frac{\sigma^2}{2}\, \phi' = B(\phi) - s\phi - q, \qquad \lim_{x \to \pm\infty} \phi(x) = w^\pm, \tag{30} $$
where the speed of the wave $s$ and $w^- \neq w^+$ satisfy the Rankine-Hugoniot condition

$$ s = \frac{B(w^+) - B(w^-)}{w^+ - w^-}, \tag{31} $$

which implies $q = B(w^\pm) - s w^\pm$. In the sequel, the boundedness will be implicitly assumed and therefore $\phi$ will only
be referred to as a traveling wave.

By the Cauchy-Lipschitz Theorem, for a solution to (30) to exist, it is necessary and sufficient that $q \neq B(w) - sw$
for $w$ strictly between $w^-$ and $w^+$, which is usually called the Oleinik E-condition [32]. Under this condition, all the
traveling waves are translations of each other. If $w^- < w^+$, the Oleinik E-condition rewrites

$$ \forall w \in (w^-, w^+), \qquad \frac{B(w) - B(w^-)}{w - w^-} > \frac{B(w^+) - B(w^-)}{w^+ - w^-}, \tag{32} $$

that is to say the graph of $B$ remains strictly above the line segment joining the points $(w^-, B(w^-))$ and $(w^+, B(w^+))$,
and it implies that $\phi' > 0$ on $\mathbb{R}$, so that $\phi$ increases from $w^-$ to $w^+$. If $w^- > w^+$, then the graph of $B$ must remain strictly
below this line segment, and $\phi$ decreases from $w^-$ to $w^+$.

If $\phi$ is a traveling wave of (29), it is immediate that $u(t, x) := \phi(x - st)$ solves (29). If $s = 0$, then $u$ is a stationary
solution to (29). It has been known since the works by Il'in and Oleinik [21,22] that traveling waves describe the long
time behavior of solutions to (29). The following $L^1$ stability theorem is due to Freistühler and Serre [12] and is based on an
earlier result by Osher and Ralston [33].

Theorem 4.1 (Freistühler and Serre). Let $\phi$ be a traveling wave of (29) and $u_0$ such that $u_0 - \phi \in L^1(\mathbb{R})$. The solution
$u(t, x)$ of (29) with initial datum $u_0$ satisfies

$$ \lim_{t \to +\infty} \| u(t, \cdot) - \phi(\cdot - st + \delta) \|_{L^1(\mathbb{R})} = 0, \tag{33} $$

where the phase shift $\delta$ is defined by

$$ \delta := \frac{1}{w^+ - w^-} \int_{x \in \mathbb{R}} (u_0(x) - \phi(x))\, dx. \tag{34} $$

Note that the definition (34) of $\delta$ ensures that

$$ \int_{x \in \mathbb{R}} (u_0(x) - \phi(x + \delta))\, dx = 0. \tag{35} $$
For a nonconstant diffusion coefficient $\sigma^2$, the equation (30) defining a traveling wave has to be replaced with

$$ \frac{1}{2} (A(\phi))' = B(\phi) - s\phi - q, \qquad \lim_{x \to \pm\infty} \phi(x) = w^\pm, \tag{36} $$

where $s$ is still defined in terms of $w^- \neq w^+$ by the Rankine-Hugoniot condition (31). Under the uniform ellipticity
condition $\inf_u \sigma^2(u) > 0$, the Oleinik E-condition remains necessary and sufficient for a traveling wave to exist, and
traveling waves remain monotonic on the real line. Gasnikov [15] proved that, if $A$ and $B$ are $C^4$ on $[w^-, w^+]$ (or
$[w^+, w^-]$), then Theorem 4.1 holds without any change in its statement.

4.2. Probabilistic interpretation of (29) 

Traveling waves of (29) with $w^- = 0$ and $w^+ = 1$ can be interpreted as cumulative distribution functions (CDFs)
of probability measures on the real line. If we assume that the initial datum $u_0$ of (29) is also the CDF of a probability
measure $m$, then $u(t, \cdot)$ remains the CDF of a probability measure $P_t$ at all times. Besides, taking the space derivative
of (29) yields the formal evolution equation

$$ \partial_t P_t = \frac{1}{2}\, \partial_x^2 \big( \sigma^2(H * P_t(x))\, P_t \big) - \partial_x \big( b(H * P_t(x))\, P_t \big) \tag{37} $$

for $P_t$, where $b := B'$ and $H * P_t = u(t, \cdot)$ refers to the spatial convolution of $P_t$ with the Heaviside function. The
equation (37) is the Fokker-Planck equation of the diffusion process

$$ \begin{cases} dX_t = b(H * P_t(X_t))\, dt + \sigma(H * P_t(X_t))\, dW_t, \\ P_t \text{ is the law of } X_t, \end{cases} \tag{38} $$

where $(W_t)_{t \ge 0}$ is a standard real Brownian motion, and $X_0$ is distributed according to $m$, independently of $(W_t)_{t \ge 0}$.
Note that the coefficients of this stochastic differential equation depend on the law $P_t$ of $X_t$, which is the trace of the
nonlinearity of the Fokker-Planck equation (37). Therefore, the process $(X_t)_{t \ge 0}$ is said to be nonlinear in McKean's
sense.

The existence and uniqueness of the nonlinear process $(X_t)_{t \ge 0}$ were established in [26] under the assumptions that $b$
and $\sigma^2$ are continuous on $[0, 1]$, $m$ has a finite first order moment, $\sigma^2(u) > 0$ on $(0, 1)$ and:

• if $\sigma^2(0) = 0$, then $u_0(x) > 0$ for all $x \in \mathbb{R}$;

• if $\sigma^2(1) = 0$, then $u_0(x) < 1$ for all $x \in \mathbb{R}$;
where $u_0 := H * m$ is the CDF of the initial distribution $m$.

Remark 4.2. The assumption on the finiteness of the first order moment of $m$ is natural in order to obtain $L^1$ stability
results on the solution to (29). Indeed, in general, if $F_1$ and $F_2$ are the CDFs of probability measures $m_1$ and $m_2$ on $\mathbb{R}$,
then $\|F_1 - F_2\|_{L^1(\mathbb{R})}$ need not be finite, but if we assume in addition that the first order moments of $m_1$ and $m_2$ are finite,
then $\|F_1 - F_2\|_{L^1(\mathbb{R})} < +\infty$, and the difference between the expectations of $m_1$ and $m_2$ is given by

$$ \int_{x \in \mathbb{R}} x\, m_1(dx) - \int_{x \in \mathbb{R}} x\, m_2(dx) = -\int_{x \in \mathbb{R}} (F_1(x) - F_2(x))\, dx. \tag{39} $$

We first provide a probabilistic interpretation of the speed $s$ of a traveling wave as the average velocity of the nonlinear
process $(X_t)_{t \ge 0}$. Indeed, the expectation of $X_t$ satisfies

$$ E[X_t] = E[X_0] + \int_{s=0}^{t} E\big[ b(H * P_s(X_s)) \big]\, ds. \tag{40} $$

Besides, it was proved in [26] that, $ds$-almost everywhere, the measure $P_s$ does not charge points, which implies that
$H * P_s(X_s)$ is uniformly distributed on $[0, 1]$. We therefore rewrite

$$ E[X_t] = E[X_0] + \int_{s=0}^{t} \int_{u=0}^{1} b(u)\, du\, ds = E[X_0] + st, \tag{41} $$

where, on the right-hand side, $s$ is given by the Rankine-Hugoniot condition $s = B(1) - B(0)$.
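
The following Python sketch illustrates this computation on a particle approximation of the nonlinear process. It is an assumption of this example, not a construction quoted from [26]: the law $P_t$ is replaced by the empirical measure of $N$ particles, so that $H * P_t(X_t)$ becomes the normalized rank of each particle, and time is discretized by an Euler scheme. The function names are hypothetical.

```python
import math
import random


def empirical_cdf_at_particles(xs):
    """Return (1/N) * #{j : x_j <= x_i} for each particle i (exact when positions are distinct)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    values = [0.0] * n
    for rank, i in enumerate(order, start=1):
        values[i] = rank / n
    return values


def euler_step(xs, b, sigma, dt, rng):
    """One Euler step of the rank-based particle system approximating the nonlinear process."""
    u = empirical_cdf_at_particles(xs)
    return [
        x + b(ui) * dt + sigma(ui) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        for x, ui in zip(xs, u)
    ]


def mean(xs):
    """Empirical mean of a list of positions."""
    return sum(xs) / len(xs)
```

Without noise, the mean displacement over one step is `dt` times the average of `b` over the grid $\{1/N, \dots, 1\}$, a Riemann sum for $\int_0^1 b(u)\, du = B(1) - B(0)$. This is the discrete counterpart of (41).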

We now describe the long time behaviour of the nonlinear process in terms of traveling waves. We first discuss 
conditions ensuring that traveling waves are well defined thanks to the following lemma, which was obtained in [26, 
Proposition 4.1 and Corollary 4.4] by solving (36) explicitly. 

Lemma 4.3. Assume that $b$ and $\sigma^2$ are continuous on $[0, 1]$, that $A$ is increasing on $[0, 1]$, and that the Oleinik E-condition

$$ \forall u \in (0, 1), \qquad B(u) > B(0) + su \tag{42} $$

is satisfied, where $s$ is defined by the Rankine-Hugoniot condition $s = B(1) - B(0)$.
Then there exists a traveling wave $\phi$ increasing from 0 to 1, and all such traveling waves are translations of each other.
Besides, the probability measure with CDF $\phi$ has a finite first order moment if and only if

$$ \int_{u=0}^{1/2} \frac{u\, \sigma^2(u)}{B(u) - B(0) - su}\, du + \int_{u=1/2}^{1} \frac{(1 - u)\, \sigma^2(u)}{B(u) - B(0) - su}\, du < +\infty. \tag{43} $$


Let us mention that, if the Oleinik E-condition is relaxed by allowing that $B(u) = B(0) + su$ for some $u \in (0, 1)$ such
that $\sigma^2(u) = 0$, then one can exhibit traveling waves that are not translations of each other, see [26, Remark 4.2].
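
Conditions (42) and (43) can be explored numerically for a given flux. The sketch below is only illustrative: it tests (42) on a finite grid, which can suggest but not prove the condition, and it evaluates the integral in (43) by a midpoint rule, which is meaningful only when the integrand is integrable. The example flux $B(u) = u(1-u)/2$ with $\sigma^2 = 1$ and all function names are assumptions made here.

```python
import math


def rankine_hugoniot_speed(B):
    """Speed s = B(1) - B(0) of waves increasing from 0 to 1."""
    return B(1.0) - B(0.0)


def satisfies_oleinik(B, n=1000):
    """Check B(u) > B(0) + s*u at the interior grid points u = k/n (a grid test, not a proof)."""
    s = rankine_hugoniot_speed(B)
    return all(B(k / n) > B(0.0) + s * (k / n) for k in range(1, n))


def moment_integral(B, sigma2, n=2000):
    """Midpoint approximation of the sum of the two integrals appearing in (43)."""
    s = rankine_hugoniot_speed(B)
    h = 0.5 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h        # midpoint in (0, 1/2)
        v = 0.5 + (k + 0.5) * h  # midpoint in (1/2, 1)
        total += u * sigma2(u) / (B(u) - B(0.0) - s * u)
        total += (1.0 - v) * sigma2(v) / (B(v) - B(0.0) - s * v)
    return total * h


# Closed form of the integral for the example flux B(u) = u(1-u)/2 with sigma^2 = 1.
EXAMPLE_REFERENCE = 4.0 * math.log(2.0)
```

For the example flux the grid test succeeds and the integral is finite, consistent with the logistic wave having a finite first moment. For the Burgers flux $B(u) = u^2/2$ the grid test fails, since that flux lies below its chord on $(0, 1)$.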

Let us now fix a probability measure $m$ with finite first order moment, and let $\phi$ be given by Lemma 4.3. If (43) holds,
then by Remark 4.2, $u_0 - \phi \in L^1(\mathbb{R})$; besides, choosing the phase shift $\delta$ in order to satisfy (35) amounts to selecting
$u_\infty = \phi(\cdot + \delta)$ having the same expectation as $u_0$. By (41), we already know that the expectation of $X_t - st$ is constant
and equal to the expectation of $u_\infty$. Theorem 4.1 contains the much stronger statement that the long time behaviour of
this process is described by the stationary wave $u_\infty$, in the sense that the CDF of $X_t - st$ converges to $u_\infty$ in $L^1(\mathbb{R})$. In
the next subsection, we extend this result to Wasserstein distances.


4.3. Contraction and convergence to equilibrium in Wasserstein distance 

We recall that the Wasserstein distance of order $p \in [1, +\infty)$ between two probability measures on the real line with
respective CDFs $F_1$ and $F_2$ is given by

$$ W_p(F_1, F_2) := \left( \int_{w=0}^{1} \big| F_1^{-1}(w) - F_2^{-1}(w) \big|^p\, dw \right)^{1/p}, \tag{44} $$

where the pseudo-inverse $F^{-1}$ of a CDF $F$ is defined by $F^{-1}(w) := \inf\{x \in \mathbb{R} : F(x) \ge w\}$ for all $w \in (0, 1)$. Note
that, in particular,

$$ W_1(F_1, F_2) = \|F_1 - F_2\|_{L^1(\mathbb{R})}. \tag{45} $$
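
For empirical measures with the same number of atoms, formula (44) reduces to matching sorted samples, since the pseudo-inverse of an empirical CDF is piecewise constant and takes the order statistics as values. The sketch below illustrates this, together with the identity (45). The function names are hypothetical and the code assumes two samples of equal size.

```python
def wasserstein(xs, ys, p=1):
    """W_p between the empirical measures of two equal-size samples, via the quantile coupling."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    n = len(xs)
    total = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (total / n) ** (1.0 / p)


def cdf_l1_distance(xs, ys):
    """Integral over the real line of |F1 - F2| for the two empirical CDFs (step functions)."""
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        f1 = sum(1 for x in xs if x <= left) / len(xs)
        f2 = sum(1 for y in ys if y <= left) / len(ys)
        total += abs(f1 - f2) * (right - left)
    return total
```

On any pair of equal-size samples, `wasserstein(xs, ys, p=1)` and `cdf_l1_distance(xs, ys)` should coincide up to rounding, which is (45) in the empirical setting. Translating a sample by a constant $c$ gives a distance $|c|$ for every $p$.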

Our first result is the following Wasserstein contraction property of (29). Given two CDFs $u_0$, $v_0$ on the real line, we
now denote by $u_t := u(t, \cdot)$ and $v_t := v(t, \cdot)$ the corresponding solutions to (29).

Proposition 4.4. Assume that $b$ and $\sigma^2$ are continuous on $[0, 1]$, that $A$ is increasing on $[0, 1]$ and that $m$ has a finite first
order moment. For all $p \in [1, +\infty)$,

• if $W_p(u_0, v_0) = +\infty$, then $W_p(u_t, v_t) = +\infty$ for all $t \ge 0$;

• if $W_p(u_0, v_0) < +\infty$, then $t \mapsto W_p(u_t, v_t)$ is nonincreasing on $[0, +\infty)$.

The proof of Proposition 4.4 is detailed in [26, Proposition 3.1]. It is entirely probabilistic, and relies on a coupling
argument for the order statistics of a system of mean-field interacting particles approximating $u_t$ and $v_t$. We note that,
by (45), the case $p = 1$ of Proposition 4.4 is nothing but the classical $L^1$ stability estimate

$$ \forall t \ge 0, \qquad \|u_t - v_t\|_{L^1(\mathbb{R})} \le \|u_0 - v_0\|_{L^1(\mathbb{R})} \tag{46} $$

for (29). Similar Wasserstein estimates were obtained for scalar conservation laws, that is to say $\sigma^2 = 0$ in our setting, by
Bolley, Brenier and Loeper in [6].

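Assuming classical regularity for $u$ and $v$, one can actually go deeper into the description of the evolution of $W_p(u_t, v_t)$.
Indeed, observing that the pseudo-inverse $u_t^{-1}(w)$ of the CDF $u_t$ satisfies the equation

$$ \partial_t u_t^{-1} = b(w) - \partial_w \left( \frac{\sigma^2(w)}{2\, \partial_w u_t^{-1}} \right), \tag{47} $$

one can derive the explicit formula for the time derivative of $W_p(u_t, v_t)^p$:

$$ \frac{d}{dt} W_p(u_t, v_t)^p = -\frac{p(p - 1)}{2} \int_{w=0}^{1} \sigma^2(w)\, \big| u_t^{-1} - v_t^{-1} \big|^{p-2}\, \frac{\big( \partial_w u_t^{-1} - \partial_w v_t^{-1} \big)^2}{\partial_w u_t^{-1}\, \partial_w v_t^{-1}}\, dw. \tag{48} $$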
This leads to the following convergence theorem, which is the main result of [26] and readily extends Theorem 4.1 to
Wasserstein distances.

Theorem 4.5. Assume that:

• $b$ is $C^{1+\beta}$ on $[0, 1]$, $\sigma^2$ is positive and $C^{2+\beta}$ on $[0, 1]$, and the equilibrium conditions (42) and (43) hold;

• the probability measure $m$ has a finite first order moment;

• the Wasserstein distance of order 2 between $u_0 := H * m$ and any traveling wave $\phi$ increasing from 0 to 1 is
finite.

Let us denote by $u_\infty$ the traveling wave with the same expectation as $u_0$. Then, for all $p \ge 2$ such that $W_p(u_0, u_\infty) <
+\infty$, we have

$$ \forall\, 1 \le q < p, \qquad \lim_{t \to +\infty} W_q\big( u(t, \cdot),\, u_\infty(\cdot - st) \big) = 0. \tag{49} $$


4.4. Conclusion 

We have interpreted the scalar nonlinear advection-diffusion equation (29), with a CDF as an initial datum, as the
Fokker-Planck equation of a nonlinear diffusion process $(X_t)_{t \ge 0}$. The expectation of this process evolves linearly in time,
at a velocity given by the speed of traveling waves of (29) increasing from 0 to 1. Under the Oleinik E-condition (42),
Theorem 4.1 shows that the fluctuation of $X_t$ around $st$ converges, in $L^1(\mathbb{R})$, to an equilibrium distribution described by
the traveling wave having the same expectation as $X_t - st$. Theorem 4.5 extends this result to Wasserstein distances.

The probabilistic interpretation of (29) can also lead to further developments on Theorems 4.1 and 4.5. For example, in
the case of a constant diffusion coefficient $\sigma^2$, an exponential rate of decay to equilibrium for $X_t - st$ was obtained in [25],
for initial distributions close to the equilibrium distribution. The decay was expressed in $\chi^2$ distance, and using the
transport chi-square inequality of [24], it can be translated into quadratic Wasserstein distance. We refer to [26, Subsection 4.3]
for details in this direction.